Advancing sleep detection by modelling weak label sets: A novel weakly supervised learning approach

Published 27 Feb 2024 in cs.LG | (2402.17601v1)

Abstract: Understanding sleep and activity patterns plays a crucial role in physical and mental health. This study introduces a novel approach to sleep detection based on weakly supervised learning, for scenarios where reliable ground truth labels are unavailable. The proposed method relies on a set of weak labels derived from the predictions of conventional sleep detection algorithms. We propose a generalised non-linear statistical model in which the number of weak sleep labels is modelled as the outcome of a binomial distribution. The probability of sleep in the binomial distribution is linked to the outputs of neural networks trained to detect sleep from actigraphy. We show that maximizing the likelihood function of the model is equivalent to minimizing the soft cross-entropy loss. Additionally, we explore the use of the Brier score as a loss function for weak labels. The efficacy of the suggested modelling framework was demonstrated using the Multi-Ethnic Study of Atherosclerosis dataset. A long short-term memory (LSTM) network trained on the soft cross-entropy outperformed conventional sleep detection algorithms, other neural network architectures and loss functions in accuracy and model calibration. This research not only advances sleep detection techniques in scenarios where ground truth data is scarce but also contributes to the broader field of weakly supervised learning by introducing an innovative approach to modelling sets of weak labels.
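
The equivalence stated in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes that each epoch receives n weak labels of which k vote "sleep", that the soft label is k/n, and that p is the network's predicted sleep probability. The function names and the toy numbers are illustrative assumptions. Under these assumptions the binomial negative log-likelihood equals n times the soft cross-entropy plus a term that does not depend on p, so both objectives share the same minimiser.

```python
import math


def binomial_nll(k, n, p):
    """Negative log-likelihood of k[i] ~ Binomial(n, p[i]), summed over epochs."""
    return -sum(
        math.log(math.comb(n, ki))
        + ki * math.log(pi)
        + (n - ki) * math.log(1.0 - pi)
        for ki, pi in zip(k, p)
    )


def soft_cross_entropy(y, p):
    """Cross-entropy between soft labels y (fractions in [0, 1]) and predictions p."""
    return -sum(
        yi * math.log(pi) + (1.0 - yi) * math.log(1.0 - pi)
        for yi, pi in zip(y, p)
    )


def brier_score(y, p):
    """Mean squared difference between soft labels and predicted probabilities."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)


def demo():
    n = 5                          # hypothetical number of weak labellers
    k = [0, 2, 5, 4]               # "sleep" votes per epoch (toy data)
    y = [ki / n for ki in k]       # soft labels
    p_a = [0.1, 0.5, 0.9, 0.7]     # two candidate sets of predictions
    p_b = [0.2, 0.3, 0.8, 0.9]
    # The binomial coefficient cancels in the difference, leaving n * CE.
    diff_nll = binomial_nll(k, n, p_a) - binomial_nll(k, n, p_b)
    diff_ce = n * (soft_cross_entropy(y, p_a) - soft_cross_entropy(y, p_b))
    return diff_nll, diff_ce


if __name__ == "__main__":
    print(demo())
```

The sketch compares differences between two candidate predictions rather than raw loss values, because the two losses agree only up to the factor n and the p-independent binomial coefficient term. The Brier score is included as the alternative loss the abstract mentions; it is zero when the predictions match the soft labels exactly.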

[2003] Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. 
[2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. 
[2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. 
Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. 
Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. 
NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. 
Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. 
Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. 
Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. 
Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. 
Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 
1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. 
Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. 
[2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arXiv preprint arXiv:2206.01653 (2023)
Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using Bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948)
Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR
Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013)
Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009)
Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022)
Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019)
Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018)
Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer
Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018)
Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE
Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023)
Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in Bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021)
Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018)
Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022)
Boeker, M.: Code for “A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  2. Granovsky, L., Shalev, G., Yacovzada, N., Frank, Y., Fine, S.: Actigraphy-based sleep/wake pattern detection using convolutional neural networks. arXiv preprint arXiv:1802.07945 (2018) Barouni et al. [2020] Barouni, A., Ottenbacher, J., Schneider, J., Feige, B., Riemann, D., Herlan, A., El Hardouz, D., McLennan, D.: Ambulatory sleep scoring using accelerometers—distinguishing between nonwear and sleep/wake states. PeerJ 8, 8284 (2020) Sadeh [2011] Sadeh, A.: The role and validity of actigraphy in sleep medicine: an update. Sleep medicine reviews 15(4), 259–267 (2011) Ancoli-Israel et al. [2003] Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Barouni, A., Ottenbacher, J., Schneider, J., Feige, B., Riemann, D., Herlan, A., El Hardouz, D., McLennan, D.: Ambulatory sleep scoring using accelerometers—distinguishing between nonwear and sleep/wake states. PeerJ 8, 8284 (2020) Sadeh [2011] Sadeh, A.: The role and validity of actigraphy in sleep medicine: an update. Sleep medicine reviews 15(4), 259–267 (2011) Ancoli-Israel et al. [2003] Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A.: The role and validity of actigraphy in sleep medicine: an update. Sleep medicine reviews 15(4), 259–267 (2011) Ancoli-Israel et al. 
[2003] Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. 
[2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. 
Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 
1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). 
Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  3. Barouni, A., Ottenbacher, J., Schneider, J., Feige, B., Riemann, D., Herlan, A., El Hardouz, D., McLennan, D.: Ambulatory sleep scoring using accelerometers—distinguishing between nonwear and sleep/wake states. PeerJ 8, 8284 (2020)
[2003] Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. 
[2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003) Lam [2008] Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. 
[2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. 
Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. 
Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. 
NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. 
Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. 
Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. 
[2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. 
[2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
  4. Sadeh, A.: The role and validity of actigraphy in sleep medicine: an update. Sleep medicine reviews 15(4), 259–267 (2011)
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. 
Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. 
Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. 
Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. 
Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). 
Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  5. Ancoli-Israel, S., Cole, R., Alessi, C., Chambers, M., Moorcroft, W., Pollak, C.P.: The role of actigraphy in the study of sleep and circadian rhythms. Sleep 26(3), 342–392 (2003)
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Lam, R.: Addressing circadian rhythm disturbances in depressed patients. Journal of Psychopharmacology 22(7_suppl), 13–18 (2008) Grierson et al. [2016] Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. 
[2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016) Hori et al. [2016] Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. 
Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. 
Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. 
[2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al.
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al.
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. 
https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. 
[2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. 
[2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
  7. Grierson, A.B., Hickie, I.B., Naismith, S.L., Hermens, D.F., Scott, E.M., Scott, J.: Circadian rhythmicity in emerging mood disorders: state or trait marker? International journal of bipolar disorders 4(1), 1–7 (2016)
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016) Difrancesco et al. [2022] Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. 
Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022) Tahmasian et al. [2013] Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. 
Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. 
NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. 
Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. 
Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. 
[2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). 
Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  8. Hori, H., Koga, N., Hidese, S., Nagashima, A., Kim, Y., Higuchi, T., Kunugi, H.: 24-h activity rhythm and sleep in depressed outpatients. Journal of psychiatric research 77, 27–34 (2016)
[2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013) Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018) Anderer et al. 
[2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  9. Difrancesco, S., Penninx, B.W., Riese, H., Giltay, E.J., Lamers, F.: The role of depressive symptoms and symptom dimensions in actigraphy-assessed sleep, circadian rhythm, and physical activity. Psychological Medicine 52(13), 2760–2766 (2022)
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. 
Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. 
Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. 
[2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. 
Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
  10. Tahmasian, M., Khazaie, H., Golshani, S., Avis, K.T.: Clinical application of actigraphy in psychotic disorders: a systematic review. Current psychiatry reports 15, 1–15 (2013)
  Zhou [2018] Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018)
  Anderer et al. [2023] Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023)
  Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019)
  Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022)
  Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021)
  Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997)
  [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023
  Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019)
  Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994)
  Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992)
  Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023)
  Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004)
  Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR
  Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015)
  Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010)
  Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019)
  Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003)
  Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020)
  Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020)
  Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008)
  Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007)
  Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950)
  Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015)
  Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013)
  Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012)
  LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015)
  Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
  Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). PMLR
  Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR
  Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
  Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022)
  [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653
  Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
  Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948)
  Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR
  Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013)
  Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009)
  Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022)
  Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019)
  Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018)
  Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer
  Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018)
  Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE
  Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023)
  Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021)
  Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018)
  Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022)
  Boeker [2023] Boeker, M.: Code for "A Novel Approach for Sleep Detection Using Weakly Supervised Learning". GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
[2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023) Rundo and Downey III [2019] Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. 
[2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
  11. Zhou, Z.-H.: A brief introduction to weakly supervised learning. National science review 5(1), 44–53 (2018)
  12. Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023)
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. 
[2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
  12. Anderer, P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Fonseca, P.: Overview of the hypnodensity approach to scoring sleep for polysomnography and home sleep testing. Frontiers in Sleep 2, 1163477 (2023)
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. 
[2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. 
https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  13. Rundo, J.V., Downey III, R.: Polysomnography. Handbook of clinical neurology 160, 381–392 (2019) Penzel [2022] Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022) Sundararajan et al. [2021] Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. 
[2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  14. Penzel, T.: Sleep scoring moving from visual scoring towards automated scoring. Oxford University Press US (2022)
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific reports 11(1), 24 (2021) Oakley [1997] Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. 
Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997) [18] Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. 
[2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). 
Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  15. Sundararajan, K., Georgievska, S., Te Lindert, B.H., Gehrman, P.R., Ramautar, J., Mazzotti, D.R., Sabia, S., Weedon, M.N., Someren, E.J., Ridder, L., et al.: Sleep classification from wrist-worn accelerometer data using random forests. Scientific Reports 11(1), 24 (2021)
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023 Schoch et al. [2019] Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 
1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019) Sadeh et al. [1994] Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. 
Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. 
[2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
Cole, R.J., Kripke, D.F., Gruen, W., Mullaney, D.J., Gillin, J.C.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992)
Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023)
Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., CHIME Study Group, et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004)
Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR
Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the Multi-Ethnic Study of Atherosclerosis (MESA). Sleep 38(6), 877–888 (2015)
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
  16. Oakley, N.R.: Validation with polysomnography of the sleepwatch sleep/wake scoring algorithm used by the actiwatch activity monitoring system. mini mitter co. Sleep 2, 0–140 (1997)
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. 
[2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. 
[2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
  17. Actilife: Actilife - Detect Sleep Periods. https://actigraphcorp.my.site.com/support/s/article/What-does-the-Detect-Sleep-Periods-button-do-and-how-does-it-work. Accessed on October 30, 2023
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. 
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. 
Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. 
arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  18. Schoch, S.F., Jenni, O.G., Kohler, M., Kurth, S.: Actimetry in infant sleep research: an approach to facilitate comparability. Sleep 42(7), 083 (2019)
[2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. 
In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994) Roger [1992] Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. 
[2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. 
Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992) Patterson et al. [2023] Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. 
[2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023) Sazonov et al. [2004] Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. 
Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. 
In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. 
[2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. 
Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  19. Sadeh, A., Sharkey, M., Carskadon, M.A.: Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 17(3), 201–207 (1994)
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. 
Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  20. Roger, J.: Automatic sleep/wake identification from wrist activity. Sleep 15(5), 461–469 (1992)
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? 
improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. 
[2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  21. Patterson, M.R., Nunes, A.A., Gerstel, D., Pilkar, R., Guthrie, T., Neishabouri, A., Guo, C.C.: 40 years of actigraphy in sleep medicine and current state of the art algorithms. NPJ Digital Medicine 6(1), 51 (2023)
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sazonov, E., Sazonova, N., Schuckers, S., Neuman, M., Group, C.S., et al.: Activity-based sleep–wake identification in infants. Physiological measurement 25(5), 1291 (2004) Gal and Ghahramani [2016] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. 
In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR Chen et al. [2015] Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. 
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. 
[2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. 
[2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015) Donmez et al. [2010] Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. 
[2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning i: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010) Bonab and Can [2019] Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. 
IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 
614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  23. Gal, Y., Ghahramani, Z.: Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning, pp. 1050–1059 (2016). PMLR
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019) Kuncheva and Whitaker [2003] Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. 
[2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003) Li et al. [2020] Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. 
[2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020) Wang et al. [2020] Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020) Sheng et al. [2008] Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. 
[2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
  24. Chen, X., Wang, R., Zee, P., Lutsey, P.L., Javaheri, S., Alcántara, C., Jackson, C.L., Williams, M.A., Redline, S.: Racial/ethnic differences in sleep disturbances: the multi-ethnic study of atherosclerosis (mesa). Sleep 38(6), 877–888 (2015)
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
  25. Donmez, P., Lebanon, G., Balasubramanian, K.: Unsupervised supervised learning I: Estimating classification and regression errors without labels. Journal of Machine Learning Research 11(4) (2010)
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. 
arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 
139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. 
arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  26. Bonab, H., Can, F.: Less is more: A comprehensive framework for the number of components of ensemble classifiers. IEEE Transactions on neural networks and learning systems 30(9), 2735–2745 (2019)
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
  27. Kuncheva, L.I., Whitaker, C.J.: Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy. Machine learning 51, 181–207 (2003)
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  28. Li, X., Zhang, Y., Jiang, F., Zhao, H.: A novel machine learning unsupervised algorithm for sleep/wake identification using actigraphy. Chronobiology international 37(7), 1002–1015 (2020)
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. 
Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? 
(2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
  29. Wang, Q., Ma, Y., Zhao, K., Tian, Y.: A comprehensive survey of loss functions in machine learning. Annals of Data Science, 1–26 (2020)
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. 
[2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. 
The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. 
Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. 
arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  30. Sheng, V.S., Provost, F., Ipeirotis, P.G.: Get another label? improving data quality and data mining using multiple, noisy labelers. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 614–622 (2008) Gneiting and Raftery [2007] Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American statistical Association 102(477), 359–378 (2007) Brier [1950] Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950) Kingma et al. [2015] Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
  31. Gneiting, T., Raftery, A.E.: Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association 102(477), 359–378 (2007)
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. 
[2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  32. Brier, G.W.: Verification of forecasts expressed in terms of probability. Monthly weather review 78(1), 1–3 (1950)
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). 
PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
  33. Kingma, D.P., Salimans, T., Welling, M.: Variational dropout and the local reparameterization trick. Advances in neural information processing systems 28 (2015) Giudici et al. [2013] Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 
29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013) Fox and Roberts [2012] Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012) LeCun et al. [2015] LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. 
In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. 
In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
  34. Giudici, P., Givens, G.H., Mallick, B.K.: Wiley Series in Computational Statistics vol. 596. Wiley Online Library, ??? (2013)
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. 
arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  35. Fox, C.W., Roberts, S.J.: A tutorial on variational bayesian inference. Artificial intelligence review 38, 85–95 (2012)
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. 
In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. 
Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  36. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. nature 521(7553), 436–444 (2015) Hochreiter and Schmidhuber [1997] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997) Ioffe and Szegedy [2015] Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). pmlr Guo et al. [2017] Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR Paszke et al. [2017] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  37. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural computation 9(8), 1735–1780 (1997)
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017) Hicks et al. [2022] Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. 
[2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022) [43] Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. 
arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  38. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456 (2015). PMLR
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. 
arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). 
Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023)
  39. Guo, C., Pleiss, G., Sun, Y., Weinberger, K.Q.: On calibration of modern neural networks. In: International Conference on Machine Learning, pp. 1321–1330 (2017). PMLR
  40. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
  41. Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022)
  42. Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arXiv preprint arXiv:2206.01653 (2023)
  43. Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using Bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
  44. Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948)
  45. Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR
Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks.
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 
139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? 
Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. 
arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  40. Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in pytorch (2017)
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arxiv 2023. arXiv preprint arXiv:2206.01653 Naeini et al. [2015] Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015) Shannon [1948] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948) Braverman et al. [2020] Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. 
[2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. 
In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. 
arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 
472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. 
  41. Hicks, S.A., Strümke, I., Thambawita, V., Hammou, M., Riegler, M.A., Halvorsen, P., Parasa, S.: On evaluation metrics for medical applications of artificial intelligence. Scientific reports 12(1), 5979 (2022)
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  42. Maier-Hein, L., Reinke, A., Godau, P., Tizabi, M., Büttner, F., Christodoulou, E., Glocker, B., Isensee, F., Kleesiek, J., Kozubek, M., et al.: Metrics reloaded: Recommendations for image analysis validation. arXiv preprint arXiv:2206.01653 (2023)
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  43. Naeini, M.P., Cooper, G., Hauskrecht, M.: Obtaining well calibrated probabilities using bayesian binning. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 29 (2015)
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  44. Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948)
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR Frénay and Verleysen [2013] Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. 
IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013) Der Kiureghian and Ditlevsen [2009] Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009) Seitzer et al. [2022] Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. 
[2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022) Venkatesh and Thiagarajan [2019] Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. 
[2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019) Zhang et al. [2018] Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. 
[2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018) Posocco and Bonnefoy [2021] Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. 
[2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. 
In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. 
IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  45. Braverman, M., Chen, X., Kakade, S., Narasimhan, K., Zhang, C., Zhang, Y.: Calibration, entropy rates, and memory in language models. In: International Conference on Machine Learning, pp. 1089–1099 (2020). PMLR
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer Garcia-Ceja et al. [2018] Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. 
[2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018) Jakobsen et al. [2020] Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. 
Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE Bakker et al. [2023] Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. 
[2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. 
[2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. 
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  46. Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE transactions on neural networks and learning systems 25(5), 845–869 (2013)
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  47. Der Kiureghian, A., Ditlevsen, O.: Aleatory or epistemic? does it matter? Structural safety 31(2), 105–112 (2009)
  48. Seitzer, M., Tavakoli, A., Antic, D., Martius, G.: On the pitfalls of heteroscedastic uncertainty estimation with probabilistic neural networks. arXiv preprint arXiv:2203.09168 (2022)
https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023) Zheng et al. [2021] Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021) Reamaroon et al. [2018] Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. 
IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018) Ju et al. [2022] Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022) Boeker [2023] Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023) Boeker, M.: Code for ”A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
  49. Venkatesh, B., Thiagarajan, J.J.: Heteroscedastic calibration of uncertainty estimators in deep learning. arXiv preprint arXiv:1910.14179 (2019)
  50. Zhang, G.-Q., Cui, L., Mueller, R., Tao, S., Kim, M., Rueschman, M., Mariani, S., Mobley, D., Redline, S.: The national sleep research resource: towards a sleep data commons. Journal of the American Medical Informatics Association 25(10), 1351–1358 (2018)
  51. Posocco, N., Bonnefoy, A.: Estimating expected calibration errors. In: Artificial Neural Networks and Machine Learning–ICANN 2021: 30th International Conference on Artificial Neural Networks, Bratislava, Slovakia, September 14–17, 2021, Proceedings, Part IV 30, pp. 139–150 (2021). Springer
  52. Garcia-Ceja, E., Riegler, M., Jakobsen, P., Tørresen, J., Nordgreen, T., Oedegaard, K.J., Fasmer, O.B.: Depresjon: a motor activity database of depression episodes in unipolar and bipolar patients. In: Proceedings of the 9th ACM Multimedia Systems Conference, pp. 472–477 (2018)
  53. Jakobsen, P., Garcia-Ceja, E., Stabell, L.A., Oedegaard, K.J., Berle, J.O., Thambawita, V., Hicks, S.A., Halvorsen, P., Fasmer, O.B., Riegler, M.A.: Psykose: A motor activity database of patients with schizophrenia. In: 2020 IEEE 33rd International Symposium on Computer-Based Medical Systems (CBMS), pp. 303–308 (2020). IEEE
  54. Bakker, J.P., Ross, M., Cerny, A., Vasko, R., Shaw, E., Kuna, S., Magalang, U.J., Punjabi, N.M., Anderer, P.: Scoring sleep with artificial intelligence enables quantification of sleep stage ambiguity: hypnodensity based on multiple expert scorers and auto-scoring. Sleep 46(2), 154 (2023)
  55. Zheng, R., Zhang, S., Liu, L., Luo, Y., Sun, M.: Uncertainty in bayesian deep label distribution learning. Applied Soft Computing 101, 107046 (2021)
  56. Reamaroon, N., Sjoding, M.W., Lin, K., Iwashyna, T.J., Najarian, K.: Accounting for label uncertainty in machine learning for detection of acute respiratory distress syndrome. IEEE journal of biomedical and health informatics 23(1), 407–415 (2018)
  57. Ju, L., Wang, X., Wang, L., Mahapatra, D., Zhao, X., Zhou, Q., Liu, T., Ge, Z.: Improving medical images classification with label noise using dual-uncertainty estimation. IEEE transactions on medical imaging 41(6), 1533–1546 (2022)
  58. Boeker, M.: Code for “A Novel Approach for Sleep Detection Using Weakly Supervised Learning”. GitHub. https://github.com/matthiasboeker/ensemble_sleep_detection, Last accessed on 15th December 2023 (2023)
